
Negative dependence



Small coresets via negative dependence: DPPs, linear statistics, and concentration

Neural Information Processing Systems

Determinantal point processes (DPPs) are random configurations of points with tunable negative dependence. Because sampling is tractable, DPPs are natural candidates for subsampling tasks, such as minibatch selection or coreset construction. A \emph{coreset} is a subset of a (large) training set, such that minimizing an empirical loss averaged over the coreset is a controlled replacement for the intractable minimization of the original empirical loss. Typically, the control takes the form of a guarantee that the average loss over the coreset approximates the total loss uniformly across the parameter space. Recent work has provided significant empirical support in favor of using DPPs to build randomized coresets, coupled with interesting theoretical results that are suggestive but leave some key questions unanswered. In particular, the central question of whether the cardinality of a DPP-based coreset is fundamentally smaller than one based on independent sampling remained open. In this paper, we answer this question in the affirmative, demonstrating that \emph{DPPs can provably outperform independently drawn coresets}. In this vein, we contribute a conceptual understanding of coreset loss as a \emph{linear statistic} of the (random) coreset. We leverage this structural observation to connect the coresets problem to a more general problem of concentration phenomena for linear statistics of DPPs, wherein we obtain \emph{effective concentration inequalities that extend well beyond the state-of-the-art}, encompassing general non-projection, even non-symmetric kernels. The latter have been recently shown to be of interest in machine learning beyond coresets, but come with a limited theoretical toolbox, to the extension of which our result contributes. Finally, we are also able to address the coresets problem for vector-valued objective functions, a novelty in the coresets literature.
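To fix ideas, the guarantee described in the abstract is commonly formalized as follows (a sketch in generic notation, not taken from the paper): for data points $x_1, \dots, x_n$, a parametric loss $\ell(x, \theta)$, and a weighted subset $S$ with weights $(w_i)$, one requires
\[
\sup_{\theta} \; \frac{\bigl| \sum_{i \in S} w_i \, \ell(x_i, \theta) - \frac{1}{n} \sum_{i=1}^{n} \ell(x_i, \theta) \bigr|}{\frac{1}{n} \sum_{i=1}^{n} \ell(x_i, \theta)} \;\le\; \varepsilon.
\]
The coreset loss $\sum_{i \in S} w_i \, \ell(x_i, \theta)$ is then a \emph{linear statistic} of the random set $S$, i.e. a sum of a fixed function evaluated at the points of $S$, which is the structural observation the abstract refers to.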



Small coresets via negative dependence: DPPs, linear statistics, and concentration

Bardenet, Rémi, Ghosh, Subhroshekhar, Simon-Onfroy, Hugo, Tran, Hoang-Son

arXiv.org Machine Learning

Determinantal point processes (DPPs) are random configurations of points with tunable negative dependence. Because sampling is tractable, DPPs are natural candidates for subsampling tasks, such as minibatch selection or coreset construction. A \emph{coreset} is a subset of a (large) training set, such that minimizing an empirical loss averaged over the coreset is a controlled replacement for the intractable minimization of the original empirical loss. Typically, the control takes the form of a guarantee that the average loss over the coreset approximates the total loss uniformly across the parameter space. Recent work has provided significant empirical support in favor of using DPPs to build randomized coresets, coupled with interesting theoretical results that are suggestive but leave some key questions unanswered. In particular, the central question of whether the cardinality of a DPP-based coreset is fundamentally smaller than one based on independent sampling remained open. In this paper, we answer this question in the affirmative, demonstrating that \emph{DPPs can provably outperform independently drawn coresets}. In this vein, we contribute a conceptual understanding of coreset loss as a \emph{linear statistic} of the (random) coreset. We leverage this structural observation to connect the coresets problem to a more general problem of concentration phenomena for linear statistics of DPPs, wherein we obtain \emph{effective concentration inequalities that extend well beyond the state-of-the-art}, encompassing general non-projection, even non-symmetric kernels. The latter have been recently shown to be of interest in machine learning beyond coresets, but come with a limited theoretical toolbox, to the extension of which our result contributes. Finally, we are also able to address the coresets problem for vector-valued objective functions, a novelty in the coresets literature.
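As an illustration of how such a DPP-based coreset could be assembled in practice (a minimal sketch, not the authors' implementation: the Gaussian kernel, the per-point losses, and the inverse-inclusion-probability weights below are placeholder assumptions), the following samples a finite DPP with the standard spectral algorithm and forms a weighted coreset loss:

    import numpy as np

    def sample_dpp(K, rng):
        """Exact sample from a finite DPP with symmetric marginal kernel K
        (eigenvalues in [0, 1]), via the standard spectral algorithm."""
        eigvals, eigvecs = np.linalg.eigh(K)
        # Phase 1: keep eigenvector i independently with probability eigvals[i].
        V = eigvecs[:, rng.random(len(eigvals)) < eigvals]
        sample = []
        while V.shape[1] > 0:
            # Pick a point with probability proportional to its squared row norm.
            p = np.sum(V ** 2, axis=1)
            i = rng.choice(len(p), p=p / p.sum())
            sample.append(i)
            # Condition on the chosen point: zero out row i, drop one column,
            # and re-orthonormalize the remaining columns.
            j = np.argmax(np.abs(V[i, :]))
            V = V - np.outer(V[:, j], V[i, :]) / V[i, j]
            V = np.delete(V, j, axis=1)
            if V.shape[1] > 0:
                V, _ = np.linalg.qr(V)
        return np.array(sorted(sample), dtype=int)

    rng = np.random.default_rng(0)
    n = 50
    X = rng.normal(size=(n, 3))                                       # toy data
    L = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))   # L-ensemble kernel
    K = L @ np.linalg.inv(L + np.eye(n))                              # marginal kernel
    S = sample_dpp(K, rng)
    losses = X[:, 0] ** 2                          # placeholder per-point losses
    pi = np.diag(K)                                # inclusion probabilities P(i in S)
    full_loss = np.mean(losses)                    # full empirical loss, for reference
    coreset_loss = np.sum(losses[S] / pi[S]) / n   # weighted coreset loss
    print(S, coreset_loss, full_loss)

Inverse-inclusion-probability weights make the coreset loss unbiased for the full empirical loss; the paper's point is that the negative dependence of the DPP then yields tighter concentration around that target than independent sampling with the same expected cardinality.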


A Family of Distributions of Random Subsets for Controlling Positive and Negative Dependence

Kawashima, Takahiro, Hino, Hideitsu

arXiv.org Machine Learning

Positive and negative dependence are fundamental concepts that characterize the attractive and repulsive behavior of random subsets. Although some probabilistic models are known to exhibit positive or negative dependence, it is challenging to seamlessly bridge them with a practicable probabilistic model. In this study, we introduce a new family of distributions, named the discrete kernel point process (DKPP), which includes determinantal point processes and parts of Boltzmann machines. We also develop some computational methods for probabilistic operations and inference with DKPPs, such as calculating marginal and conditional probabilities and learning the parameters. Our numerical experiments demonstrate the controllability of positive and negative dependence and the effectiveness of the computational methods for DKPPs.
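To make the contrast concrete, recall the standard pairwise negative-dependence property of a DPP with symmetric marginal kernel $K$ (a generic DPP fact, not specific to DKPPs): for a random sample $S$ and distinct items $i, j$,
\[
\Pr(i, j \in S) \;=\; \det \begin{pmatrix} K_{ii} & K_{ij} \\ K_{ij} & K_{jj} \end{pmatrix} \;=\; K_{ii} K_{jj} - K_{ij}^2 \;\le\; \Pr(i \in S) \, \Pr(j \in S),
\]
so distinct items can never be positively correlated under a DPP; the DKPP family described above is designed to allow positive dependence as well.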


An Interpretable Determinantal Choice Model for Subset Selection

Aarts, Sander, Shmoys, David B., Coy, Alex

arXiv.org Artificial Intelligence

Understanding how subsets of items are chosen from offered sets is critical to assortment planning, wireless network planning, and many other applications. There are two seemingly unrelated subset choice models that capture dependencies between items: intuitive and interpretable random utility models; and tractable determinantal point processes (DPPs). This paper connects the two. First, all DPPs are shown to be random utility models. Next, a determinantal choice model that enjoys the best of both worlds is specified; the model is shown to subsume logistic regression when dependence is minimal, and MNL when dependence is maximally negative. This makes the model interpretable, while retaining the tractability of DPPs. A simulation study verifies that the model can learn a continuum of negative dependencies from data, and an applied study using original experimental data produces novel insights on wireless interference in LoRa networks.
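The tractability inherited from DPPs rests on the closed-form L-ensemble likelihood $\Pr(S) = \det(L_S)/\det(L + I)$. A minimal sketch of that computation (a generic L-ensemble over a toy offered set, not the paper's specific parameterization):

    import numpy as np
    from itertools import chain, combinations

    def lensemble_prob(L, subset):
        """P(choose exactly `subset`) = det(L_subset) / det(L + I) for an L-ensemble DPP."""
        num = np.linalg.det(L[np.ix_(subset, subset)]) if len(subset) > 0 else 1.0
        return num / np.linalg.det(L + np.eye(L.shape[0]))

    # Toy offered set of 3 items; off-diagonal entries encode similarity,
    # hence repulsion between items that are alike.
    L = np.array([[1.0, 0.6, 0.1],
                  [0.6, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])
    all_subsets = chain.from_iterable(combinations(range(3), r) for r in range(4))
    print(sum(lensemble_prob(L, list(s)) for s in all_subsets))  # sums to 1.0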


Uphill Roads to Variational Tightness: Monotonicity and Monte Carlo Objectives

Mattei, Pierre-Alexandre, Frellsen, Jes

arXiv.org Machine Learning

We revisit the theory of importance weighted variational inference (IWVI), a promising strategy for learning latent variable models. IWVI uses new variational bounds, known as Monte Carlo objectives (MCOs), obtained by replacing intractable integrals by Monte Carlo estimates -- usually simply obtained via importance sampling. Burda, Grosse and Salakhutdinov (2016) showed that increasing the number of importance samples provably tightens the gap between the bound and the likelihood. Inspired by this simple monotonicity theorem, we present a series of nonasymptotic results that link properties of Monte Carlo estimates to tightness of MCOs. We challenge the rationale that smaller Monte Carlo variance leads to better bounds. We confirm theoretically the empirical findings of several recent papers by showing that, in a precise sense, negative correlation reduces the variational gap. We also generalise the original monotonicity theorem by considering non-uniform weights. We discuss several practical consequences of our theoretical results. Our work borrows many ideas and results from the theory of stochastic orders.
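For reference, the canonical Monte Carlo objective here is the importance-weighted bound of Burda, Grosse and Salakhutdinov, written in generic notation as
\[
\mathcal{L}_K \;=\; \mathbb{E}_{z_1, \dots, z_K \sim q(\cdot \mid x)} \left[ \log \frac{1}{K} \sum_{k=1}^{K} \frac{p(x, z_k)}{q(z_k \mid x)} \right] \;\le\; \log p(x),
\]
and the monotonicity theorem mentioned above states that $\mathcal{L}_K \le \mathcal{L}_{K+1}$ for every $K$, so adding importance samples can only tighten the bound; the abstract's point is that negatively correlated samples $z_k$ can tighten it further.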


On the Relationship Between Probabilistic Circuits and Determinantal Point Processes

Zhang, Honghua, Holtzen, Steven, Broeck, Guy Van den

arXiv.org Artificial Intelligence

Scaling probabilistic models to large realistic problems and datasets is a key challenge in machine learning. Central to this effort is the development of tractable probabilistic models (TPMs): models whose structure guarantees efficient probabilistic inference algorithms. The current landscape of TPMs is fragmented: there exist various kinds of TPMs with different strengths and weaknesses. Two of the most prominent classes of TPMs are determinantal point processes (DPPs) and probabilistic circuits (PCs). This paper provides the first systematic study of their relationship. We propose a unified analysis and shared language for discussing DPPs and PCs. Then we establish theoretical barriers for the unification of these two families, and prove that there are cases where DPPs have no compact representation as a class of PCs. We close with a perspective on the central problem of unifying these tractable models.